COMPLETE LISTING OF THE CLAIMS

Claim 1 (currently amended): An apparatus for reproducing a music sound and a voice
sound representative of a human voice, comprising:

a first storing section that stores a music data file containing a music part and a voice part, 
the music part containing a sequence of music generation events effective to instruct generation of 
the music sound, the voice part containing voice reproduction sequence data composed of a 
combination of voice reproduction event data and duration data, the voice reproduction event data 
being a text description type containing text information representing words to be pronounced as the 
human voice and prosodic symbols representing vocal expressions applied to pronunciation of the
words, and instructing reproduction of a sequence of voice events, the duration data specifying a 
timing of effecting a voice event in terms of a duration time measured from another voice event 
preceding to the voice event; 

a control section that reads out the music data file from the first storing section; and 

a sound generator section that operates based on the music part contained in the read music 
data file for generating the music sound representative of the sequence of the music events, and that 
operates based on the voice part contained in the read music data file for generating the voice sound 
representative of the sequence of the voice events, thereby mixing and outputting the music
sound and the voice sound. 

Claim 2 (canceled)

Claim 3 (currently amended): The apparatus according to claim 1, further comprising a
second storing section that stores first dictionary data which records correspondence between the 
text information representing words to be pronounced as the human voice sound and phoneme 
information representing phonemes of the words, and correspondence between prosodic symbols 
representing vocal expressions applied to pronunciation of the words and the prosodic control 
information for controlling the vocal expressions, and a third storing section that stores second 
dictionary data which records correspondence between a combination of the phoneme information 
and associated prosodic control information representing the voice sound to be reproduced, and 
formant control information used for generating formants of the voice sound, wherein the control 
section reads out the music data file having the voice part containing the voice reproduction event 
data of the text description type, then the control section refers to the first
dictionary data stored in the second storing section for acquiring therefrom the phoneme 
information and associated prosodic control information corresponding to the text information and 
associated prosodic symbols, and further refers to the second dictionary data stored in the third 
storing section for reading out therefrom the formant control information corresponding to the 
acquired phoneme information and associated prosodic control information, so that the sound 
generator section operates based on the read formant control information for generating the voice 
sound.

Claim 4 (canceled)

Claim 5 (currently amended): The apparatus according to claim 1, wherein the sound generator
section is operable based on a voice part of another format type for generating the
voice sound, said another format of the voice part containing voice reproduction event data of a
different description type than the voice reproduction event data of the text description type, and the
control section operates for converting the voice reproduction event data of the text description type
contained in the read voice part to voice reproduction event data of the different description type,
thereby enabling the sound generator section.

Claim 6 (currently amended): The apparatus according to claim 5, further comprising a 
second storing section that stores dictionary data required for conversion of the format type of voice 
reproduction event data of the text description type contained in the voice part of the music data file, 
so that the control section refers to the dictionary data stored in the second storing section for 
effecting the conversion of voice reproduction event data of the text description
type contained in the read voice part. 

Claim 7 (original): The apparatus according to claim 1, wherein the voice part of the music 
data file contains data specifying a kind of language of the voice part.

Claim 8 (canceled)

Claim 9 (currently amended): A memory medium for storing voice reproduction sequence 
data designed for causing a sound generator device to reproduce a human voice, wherein 

the voice reproduction sequence data has a chunk structure composed of a content 
information chunk containing information for managing the voice reproduction sequence data and at 
least one track chunk containing voice sequence data, and wherein 

the voice sequence data comprises a sequence of pairs of voice reproduction event data and 
duration data, the voice reproduction event data instructing a voice reproduction event of the human 
voice, the duration data specifying a timing of executing the voice reproduction event in terms of a 
duration time measured from a preceding voice reproduction event, and wherein

the voice reproduction event data is one of a text description type, a phoneme description 
type and a formant frame description type, the text description type of the voice reproduction event 
data containing text information specifying words to be pronounced by the sound generator device 
as the human voice and associated prosodic symbols specifying vocal expression applied to
pronunciation of the words, the phoneme description type of the voice reproduction event data 
containing phoneme information specifying phonemes of the human voice to be reproduced by the 
sound generator device and associated prosodic control information controlling vocal expressions of 
the phonemes, the formant frame description type of the voice reproduction event data containing 
formant control information specifying formants of the human voice at respective time frames.

Claim 10 (canceled)

Claim 11 (currently amended): A memory medium for storing sequence data for causing a
sound generator device to reproduce a music sound and a human voice, wherein the sequence data 
has a data structure composed of music sequence data and voice reproduction sequence data, 

the music sequence data comprising a sequence of pairs of music generation event data and 
duration data, the music generation event data instructing a music generation event of the music 
sound, and the duration data specifying a timing of executing the music generation event in terms of 
a duration time measured from a preceding music generation event, and 

the voice reproduction sequence data comprising a sequence of pairs of voice reproduction 
event data and duration data, the voice reproduction event data instructing a voice reproduction 
event of the human voice, and the duration data specifying a timing of executing the voice 
reproduction event in terms of a duration time measured from a preceding voice reproduction event, 
whereby the music sequence data and the voice reproduction sequence data are concurrently 
processed by the sound generator device so as to reproduce the music sound and the human voice 
along a common time axis, wherein

the voice reproduction event data is one of a text description type, a phoneme description 
type and a formant frame description type, the text description type of the voice reproduction event 
data containing text information specifying words to be pronounced by the sound generator device 
as the human voice and associated prosodic symbols specifying vocal expression applied to 
pronunciation of the words, the phoneme description type of the voice reproduction event data 
containing phoneme information specifying phonemes of the human voice to be reproduced by the
sound generator device and associated prosodic control information controlling vocal expressions of
the phonemes, the formant frame description type of the voice reproduction event data containing 
formant control information specifying formants of the human voice at respective time frames. 

Claim 12 (original): The memory medium according to claim 11, wherein the sequence data 
has a chunk structure such that the music sequence data and the voice reproduction sequence data 
are arranged at different chunks. 

Claim 13 (canceled) 

Claim 14 (currently amended): A server apparatus comprises a storing section and a
transmitting section, wherein 

the storing section stores a music data file containing a music part and a voice part, the 
music part containing a sequence of music generation events effective to instruct generation of the 
music sound, the voice part containing voice reproduction sequence data composed of a 
combination of voice reproduction event data and duration data, the voice reproduction event data 
instructing reproduction of a sequence of voice events, the duration data specifying a timing of 
effecting a voice event in terms of a duration time measured from another voice event preceding to 
the voice event, and 

the transmitting section responds to a request from a client terminal apparatus for 
distributing the stored music data file to the client terminal apparatus, and wherein

the voice reproduction event data is one of a text description type, a phoneme description
type and a formant frame description type, the text description type of the voice reproduction event 
data containing text information specifying words to be pronounced by the sound generator device 
as the human voice and associated prosodic symbols specifying vocal expression applied to 
pronunciation of the words, the phoneme description type of the voice reproduction event data 
containing phoneme information specifying phonemes of the human voice to be reproduced by the 
sound generator device and associated prosodic control information controlling vocal expressions of 
the phonemes, the formant frame description type of the voice reproduction event data containing 
formant control information specifying formants of the human voice at respective time frames. 

Claim 15 (canceled)

Claim 16 (currently amended): A method of controlling a music apparatus having a data
storage and a sound generator for reproducing a music sound and a voice sound representative of a 
human voice, the method comprising the steps of:

storing a music data file containing a music part and a voice part in the data storage, the 
music part containing a sequence of music generation events effective to instruct generation of the 
music sound, the voice part containing voice reproduction sequence data composed of a 
combination of voice reproduction event data and duration data, the voice reproduction event data 
being a text description type containing text information representing words to be pronounced as the 
human voice and prosodic symbols representing vocal expressions applied to pronunciation of the 
words, and instructing reproduction of a sequence of voice events, the duration data specifying a 
timing of effecting a voice event in terms of a duration time measured from another voice event 
preceding to the voice event; 

reading out the music data file from the data storage; 

operating the sound generator based on the music part contained in the read music data file 
for generating the music sound representative of the sequence of the music events, and 

operating the sound generator based on the voice part contained in the read music data file 
for generating the voice sound representative of the sequence of the voice events, thereby mixing and
outputting the music sound and the voice sound.

Claim 17 (currently amended): A computer program for use in a music apparatus having a
data storage and a sound generator, the computer program being executable in the music apparatus 
for performing a method of reproducing a music sound and a voice sound representative of a human 
voice, wherein the method comprises the steps of:

storing a music data file containing a music part and a voice part in the data storage, the 
music part containing a sequence of music generation events effective to instruct generation of the 
music sound, the voice part containing voice reproduction sequence data composed of a 
combination of voice reproduction event data and duration data, the voice reproduction event data 
being a text description type containing text information representing words to be pronounced as the 
human voice and prosodic symbols representing vocal expressions applied to pronunciation of the 
words, and instructing reproduction of a sequence of voice events, the duration data specifying a 
timing of effecting a voice event in terms of a duration time measured from another voice event 
preceding to the voice event; 

reading out the music data file from the data storage; 

operating the sound generator based on the music part contained in the read music data file 
for generating the music sound representative of the sequence of the music events, and 

operating the sound generator based on the voice part contained in the read music data file 
for generating the voice sound representative of the sequence of the voice events, thereby mixing and
outputting the music sound and the voice sound.

Application No.: 10/715,921 Docket No.: 3930320420500

Claim 18 (new): An apparatus for reproducing a voice sound representative of a human 
voice, said apparatus comprising: 

a first storing section that stores a data file containing voice reproduction event data that is a 
text description type containing text information representing words to be pronounced as the human 
voice and prosodic symbols representing vocal expressions applied to pronunciation of the words, 
and which instructs reproduction of a sequence of voice events; 

a second storing section that stores first dictionary data that records correspondence 
between the text information representing words to be pronounced as the human voice and 
phoneme information representing phonemes of the words, and correspondence between the 
prosodic symbols representing vocal expressions applied to pronunciation of the words and 
prosodic control information for controlling the vocal expressions; 

a third storing section that stores second dictionary data that records correspondence 
between a combination of the phoneme information and associated prosodic control information 
representing the human voice to be reproduced, and formant control information used for 
generating formants of the human voice; 

a control section that reads out the data file containing the voice reproduction event data of 
the text description type, then refers to the first dictionary data stored in the second storing section 
for acquiring therefrom the phoneme information and associated prosodic control information 
corresponding to the text information and associated prosodic symbols, and further refers to the 
second dictionary data stored in the third storing section for reading out therefrom the formant 
control information corresponding to the acquired phoneme information and associated prosodic 
control information; and

a sound generator section that operates based on the read formant control information for
generating the voice sound representative of the sequence of the voice events. 

Claim 19 (new): An apparatus for reproducing a voice sound representative of a human 
voice, comprising: 

a storing section that stores a data file containing voice reproduction sequence data 
composed of a combination of voice reproduction event data and duration data, the voice 
reproduction event data instructing reproduction of a sequence of voice events, the duration data 
specifying a timing of effecting a voice event in terms of a duration time measured from another
voice event preceding to the voice event, wherein the voice reproduction event data is one of a text 
description type, a phoneme description type and a formant frame description type, the text 
description type of the voice reproduction event data containing text information specifying words 
to be pronounced as the human voice and associated prosodic symbols specifying vocal expression 
applied to pronunciation of the words, the phoneme description type of the voice reproduction 
event data containing phoneme information specifying phonemes of the human voice to be 
reproduced and associated prosodic control information controlling vocal expressions of the 
phonemes, the formant frame description type of the voice reproduction event data containing 
formant control information specifying formants of the human voice at respective time frames; 

a control section that reads out the data file from the storing section and processes the read 
data file; and

a sound generator that operates based on the voice reproduction sequence data contained in
the processed data file for generating the voice sound representative of the sequence of the voice 
events based on the voice reproduction event data at the timing specified by the duration data.